Smart Paper Notes

HFedMS: Heterogeneous Federated Learning with Memorable Data Semantics in Industrial Metaverse

Shenglai Zeng , Zonghang Li , Hongfang Yu , Zhihao Zhang , Long Luo , Bo Li , Dusit Niyato

Categories: Machine Learning | Artificial Intelligence

2022-11-07

Federated Learning (FL), a rapidly evolving privacy-preserving collaborative machine learning paradigm, is a promising approach to enabling edge intelligence in the emerging Industrial Metaverse. Although many successful use cases have proved the feasibility of FL in theory, in the industrial practice of the Metaverse, non-independent and identically distributed (non-i.i.d.) data, learning forgetting caused by streaming industrial data, and scarce communication bandwidth remain key barriers to realizing practical FL. Facing these three challenges simultaneously, this paper presents a high-performance and efficient system named HFedMS for incorporating practical FL into the Industrial Metaverse. HFedMS reduces data heterogeneity through dynamic grouping and training mode conversion (Dynamic Sequential-to-Parallel Training, STP). It then compensates for forgotten knowledge by fusing compressed historical data semantics and calibrating classifier parameters (Semantic Compression and Compensation, SCC). Finally, the network parameters of the feature extractor and the classifier are synchronized at different frequencies (Layer-wise Alternative Synchronization Protocol, LASP) to reduce communication costs. These techniques make FL more adaptable to the heterogeneous streaming data continuously generated by industrial equipment, and more communication-efficient than traditional methods (e.g., Federated Averaging). Extensive experiments have been conducted on the streamed non-i.i.d. FEMNIST dataset using 368 simulated devices. Numerical results show that HFedMS improves classification accuracy by at least 6.4% compared with 8 benchmarks and reduces both overall runtime and transferred bytes by up to 98%, demonstrating its superiority in precision and efficiency.
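
The idea of synchronizing the feature extractor and the classifier at different frequencies can be illustrated with a minimal sketch. This is not the HFedMS implementation: the layer-group names, the two synchronization periods, and the helper names `fed_avg` and `layers_to_sync` are assumptions made for illustration, and the abstract does not say which group is synchronized more often. The averaging step is plain sample-size-weighted Federated Averaging.

```python
from typing import Dict, List

# A model is represented as named groups of flat parameter lists (an assumption).
Params = Dict[str, List[float]]


def fed_avg(client_params: List[Params], client_sizes: List[int],
            layers: List[str]) -> Params:
    """Sample-size-weighted average of the named layer groups across clients."""
    total = sum(client_sizes)
    averaged: Params = {}
    for name in layers:
        dim = len(client_params[0][name])
        averaged[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(dim)
        ]
    return averaged


def layers_to_sync(round_idx: int, extractor_every: int,
                   classifier_every: int) -> List[str]:
    """Hypothetical schedule: each layer group is synchronized on its own period."""
    due = []
    if round_idx % extractor_every == 0:
        due.append("feature_extractor")
    if round_idx % classifier_every == 0:
        due.append("classifier")
    return due


# Two simulated clients holding 1 and 3 samples respectively.
clients = [
    {"feature_extractor": [0.0, 4.0], "classifier": [1.0]},
    {"feature_extractor": [4.0, 0.0], "classifier": [5.0]},
]
sizes = [1, 3]

for round_idx in range(1, 5):
    due = layers_to_sync(round_idx, extractor_every=4, classifier_every=2)
    if due:
        update = fed_avg(clients, sizes, due)
        print(round_idx, update)
```

Rounds in which a group is not due transfer nothing for that group, which is where the communication saving in such a scheme would come from.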


The NLP Sandbox: an efficient model-to-data system to enable federated and unbiased evaluation of clinical NLP models

Yao Yan , Thomas Yu , Kathleen Muenzen , Sijia Liu , Connor Boyle , George Koslowski , Jiaxin Zheng , Nicholas Dobbins , Clement Essien , Hongfang Liu

Categories: Natural Language Processing | Artificial Intelligence

2022-06-28

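Objective: The evaluation of natural language processing (NLP) models for clinical text de-identification depends on the availability of clinical notes, which is often restricted due to privacy concerns. The NLP Sandbox is an approach to alleviating the lack of data and evaluation frameworks for NLP models by adopting a federated, model-to-data approach. This enables unbiased federated model evaluation without sharing sensitive data from multiple institutions. Materials and Methods: We leveraged the Synapse collaborative framework, containerization software, and OpenAPI Generator to build the NLP Sandbox (nlpsandbox.io). We evaluated two state-of-the-art NLP de-identification annotation models, Philter and NeuroNER, using data from three institutions. We further validated model performance using data from an external validation site. Results: We demonstrated the usefulness of the NLP Sandbox through the evaluation of clinical de-identification models. External developers were able to incorporate their models into the NLP Sandbox template and provide user experience feedback. Discussion: We demonstrated the feasibility of using the NLP Sandbox to conduct a multi-site evaluation of clinical text de-identification models without sharing data. Standardized model and data schemas enable smooth model transfer and implementation. To generalize the NLP Sandbox, data owners and model developers need to develop suitable, standardized schemas and adapt their data or models to fit them. Conclusion: The NLP Sandbox lowers the barrier to using clinical data for NLP model evaluation and facilitates federated, multi-site, unbiased evaluation of NLP models.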


Cross-Silo Heterogeneous Model Federated Multitask Learning

Xingjian Cao , Zonghang Li , Hongfang Yu , Gang Sun

Categories: Machine Learning

2022-02-17

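Federated learning (FL) is a machine learning technique that enables participants to collaboratively train high-quality models without exchanging their private data. Participants in the cross-silo FL (CS-FL) setting are independent organizations with different task needs; they care not only about data privacy but also, for intellectual property reasons, about training their own unique models independently. Most existing FL methods cannot satisfy these scenarios. In this paper, we propose an FL method based on pseudo-labeling of unlabeled data through a process similar to co-training. To the best of our knowledge, this is the first FL method that is simultaneously compatible with heterogeneous tasks, heterogeneous models, and heterogeneous training algorithms. Experimental results show that the proposed method outperforms competing methods. This is especially true for non-independent and identically distributed (non-IID) settings and heterogeneous models, where the proposed method achieves a 35% performance improvement.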


Discrimination, calibration, and point estimate accuracy of GRU-D-Weibull architecture for real-time individualized endpoint prediction

Xiaoyang Ruan , Liwei Wang , Michelle Mai , Charat Thongprayoon , Wisit Cheungpasitporn , Hongfang Liu

Categories: Machine Learning

2022-12-19

Real-time individual endpoint prediction has always been a challenging task but is of great clinical utility for both patients and healthcare providers. With 6,879 chronic kidney disease stage 4 (CKD4) patients as a use case, we explored the feasibility and performance of gated recurrent units with decay that model the Weibull probability density function (GRU-D-Weibull) as a semi-parametric longitudinal model for real-time individual endpoint prediction. GRU-D-Weibull has a maximum C-index of 0.77 at 4.3 years of follow-up, compared to 0.68 achieved by competing models. The L1-loss of GRU-D-Weibull is ~66% of XGB(AFT), ~60% of MTLR, and ~30% of the AFT model at the CKD4 index date. The average absolute L1-loss of GRU-D-Weibull is around one year, with a minimum of 40% Parkes serious error after the index date. GRU-D-Weibull is not calibrated and significantly underestimates true survival probability. Feature importance tests indicate that blood pressure becomes increasingly important during follow-up, while eGFR and blood albumin are less important. Most continuous features have a non-linear/parabolic impact on predicted survival time, and the results are generally consistent with existing knowledge. As a semi-parametric temporal model, GRU-D-Weibull shows advantages in built-in parameterization of missingness, native support for asynchronously arriving measurements, the capability to output both probability and point estimates at an arbitrary time point for an arbitrary prediction horizon, and improved discrimination and point estimate accuracy after incorporating newly arrived data. Further research on its performance with more comprehensive input features and with in-process or post-process calibration is warranted to benefit CKD4 or similar terminally ill patients.
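
For readers unfamiliar with the C-index used above to measure discrimination, the following is a minimal sketch of Harrell's concordance index for right-censored data. The abstract does not state which C-index variant the authors used, so treating it as Harrell's is an assumption, and the function name `concordance_index` and the toy inputs are illustrative only. A pair of subjects is comparable when the one with the shorter observed time actually experienced the event; the pair is concordant when the model also predicts a shorter survival time for that subject.

```python
from typing import Sequence


def concordance_index(times: Sequence[float], events: Sequence[int],
                      predicted_times: Sequence[float]) -> float:
    """Harrell's C: fraction of comparable pairs ordered correctly by the model.

    times: observed follow-up times; events: 1 if the endpoint was observed,
    0 if censored; predicted_times: predicted survival times (larger = later).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if predicted_times[i] < predicted_times[j]:
                    concordant += 1.0
                elif predicted_times[i] == predicted_times[j]:
                    concordant += 0.5  # ties in prediction count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: the third subject is censored at t=3.
c = concordance_index(times=[1, 2, 3], events=[1, 1, 0],
                      predicted_times=[1.5, 2.5, 2.0])
print(round(c, 3))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives context for the reported 0.77 versus 0.68.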


Wastewater Pipe Rating Model Using Natural Language Processing

Sai Nethra Betgeri , Shashank Reddy Vadyala , Dr. John C. Mattews , Dr. Hongfang Lu

Categories: Machine Learning

2022-02-22

Closed-circuit video (CCTV) inspection has been the most popular technique for visually evaluating the interior status of pipelines in recent decades. Certified inspectors prepare the pipe repair document based on the CCTV inspection. The traditional manual method of assessing sewer structural conditions from pipe repair documents takes a long time and is prone to human error. The automatic identification of the relevant text has received little attention. By building an automated framework that employs Natural Language Processing (NLP), this study presents an effective technique for automatically identifying the pipe defect rating in pipe repair documents. NLP techniques are used to break down textual material into grammatical units. Further analysis uses the words to discover pipe defect symptoms and their frequency, and then combines that information into a single score. Our model achieves 95.0% accuracy, 94.9% sensitivity, 94.4% specificity, 95.9% precision, and a 95.7% F1 score, showing the potential of the proposed model to be applied to large-scale pipe repair documents for accurate and efficient pipeline failure detection and improved pipeline quality. Keywords: Sewer pipe inspection, Defect detection, Natural language processing, Text recognition
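
The five reported metrics are all derived from a confusion matrix. The sketch below shows the standard definitions for the binary case; whether the authors' rating task is binary or multi-class (with averaged metrics) is not stated in the abstract, so the binary setting, the function name `binary_metrics`, and the toy counts are assumptions for illustration.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall / true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        # F1 is the harmonic mean of precision and sensitivity.
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Toy counts, not taken from the paper.
for name, value in binary_metrics(tp=8, fp=2, tn=9, fn=1).items():
    print(f"{name}: {value:.3f}")
```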


RxWhyQA: a clinical question-answering dataset with the challenge of multi-answer questions

Sungrim Moon , Huan He , Hongfang Liu , Jungwei W. Fan

Categories: Natural Language Processing

2022-01-07

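Objective: Create a dataset for the development and evaluation of clinical question-answering (QA) systems that can handle multi-answer questions. We leveraged the annotated relations from the 2018 National NLP Clinical Challenges (n2c2) corpus to generate a QA dataset. The 1-to-0 and 1-to-N drug-reason relations formed the unanswerable and multi-answer entries, which represent challenging scenarios lacking in existing clinical QA datasets. Results: The resulting RxWhyQA dataset contains 91,440 QA entries, of which half are unanswerable, and 21% (n = 19,269) of the answerable ones require multiple answers. The dataset conforms to the community-vetted Stanford Question Answering Dataset (SQuAD) format. Discussion: RxWhyQA is useful for comparing different systems that need to handle the zero-answer and multi-answer challenges, which demand dual mitigation of both false positive and false negative answers. Conclusion: We created and shared a clinical QA dataset with a focus on multi-answer questions to represent real-world scenarios.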
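
To make the unanswerable and multi-answer cases concrete, here is a minimal sketch of what such entries could look like in a SQuAD 2.0-style structure. The field names (`data`, `paragraphs`, `context`, `qas`, `answers`, `answer_start`, `is_impossible`) follow the public SQuAD 2.0 layout; that RxWhyQA uses exactly these fields is an assumption, and the note text, questions, and IDs below are invented for illustration and are not taken from the dataset.

```python
from typing import Dict, Tuple


def make_answer(context: str, text: str) -> Dict[str, object]:
    """Build a SQuAD-style answer; answer_start is a character offset into context."""
    return {"text": text, "answer_start": context.index(text)}


# Hypothetical note text, invented purely for illustration.
context = ("Lisinopril was started for hypertension and for heart failure. "
           "Aspirin was continued.")

dataset = {
    "version": "v2.0",
    "data": [{
        "title": "hypothetical-note-1",
        "paragraphs": [{
            "context": context,
            "qas": [
                {   # 1-to-N relation: one drug, several stated reasons
                    "id": "q1",
                    "question": "Why was lisinopril prescribed?",
                    "is_impossible": False,
                    "answers": [make_answer(context, "hypertension"),
                                make_answer(context, "heart failure")],
                },
                {   # 1-to-0 relation: no reason is given in the text
                    "id": "q2",
                    "question": "Why was aspirin prescribed?",
                    "is_impossible": True,
                    "answers": [],
                },
            ],
        }],
    }],
}


def count_entries(ds: dict) -> Tuple[int, int, int]:
    """Return counts of (unanswerable, single-answer, multi-answer) questions."""
    unanswerable = single = multi = 0
    for article in ds["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                n = len(qa["answers"])
                if qa["is_impossible"] or n == 0:
                    unanswerable += 1
                elif n == 1:
                    single += 1
                else:
                    multi += 1
    return unanswerable, single, multi


print(count_entries(dataset))
```

A system evaluated on such data has to abstain on the second question and return every listed reason for the first, which is the dual false-positive and false-negative pressure the abstract describes.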


An Open Natural Language Processing Development Framework for EHR-based Clinical Research: A case demonstration using the National COVID Cohort Collaborative (N3C)

Sijia Liu , Andrew Wen , Liwei Wang , Huan He , Sunyang Fu , Robert Miller , Andrew Williams , Daniel Harris , Ramakanth Kavuluru , Mei Liu

Categories: Natural Language Processing

2021-10-20

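While we pay attention to the latest advances in clinical natural language processing (NLP), we can notice some resistance in the clinical and translational research community to adopting NLP models because of limited transparency, interpretability, and usability. In this study, we propose an open natural language processing development framework. We evaluated it by implementing NLP algorithms for the National COVID Cohort Collaborative (N3C). Based on the interest in information extraction from COVID-19-related clinical notes, our work includes 1) an open data annotation process using COVID-19 signs and symptoms as the use case, 2) a community-driven ruleset composing platform, and 3) a synthetic text data generation workflow that generates texts for information extraction tasks without involving human subjects. The corpora were derived from texts from three different institutions (Mayo Clinic, University of Kentucky, University of Minnesota). The gold-standard annotations were tested with a single institution's (Mayo) ruleset. This resulted in F-scores of 0.876, 0.706, and 0.694 for the Mayo, Minnesota, and Kentucky test datasets, respectively. As a consortium effort of the N3C NLP subgroup, the study demonstrates the feasibility of creating a federated NLP algorithm development and benchmarking platform to enhance multi-institution clinical NLP research and adoption. Although we use COVID-19 as the use case in this work, our framework is general enough to be applied to other domains of interest in clinical NLP.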


I2F: A Unified Image-to-Feature Approach for Domain Adaptive Semantic Segmentation

Haoyu Ma , Xiangru Lin , Yizhou Yu

Categories: Computer Vision

2023-01-03

Unsupervised domain adaptation (UDA) for semantic segmentation is a promising task freeing people from heavy annotation work. However, domain discrepancies in low-level image statistics and high-level contexts compromise the segmentation performance over the target domain. A key idea to tackle this problem is to perform both image-level and feature-level adaptation jointly. Unfortunately, there is a lack of such unified approaches for UDA tasks in the existing literature. This paper proposes a novel UDA pipeline for semantic segmentation that unifies image-level and feature-level adaptation. Concretely, for image-level domain shifts, we propose a global photometric alignment module and a global texture alignment module that align images in the source and target domains in terms of image-level properties. For feature-level domain shifts, we perform global manifold alignment by projecting pixel features from both domains onto the feature manifold of the source domain; and we further regularize category centers in the source domain through a category-oriented triplet loss and perform target domain consistency regularization over augmented target domain images. Experimental results demonstrate that our pipeline significantly outperforms previous methods. In the commonly tested GTA5 → Cityscapes task, our proposed method using Deeplab V3+ as the backbone surpasses previous SOTA by 8%, achieving 58.2% in mIoU.
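
The mIoU figure quoted above is the mean, over classes, of intersection-over-union between predicted and ground-truth pixel labels. The following is a minimal sketch of that metric on flattened label lists. Benchmark implementations usually accumulate a confusion matrix over the whole dataset and skip an "ignore" label; this simplified version, the function name `mean_iou`, and the toy labels are assumptions for illustration rather than the evaluation code used in the paper.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean intersection-over-union over classes that appear in pred or target."""
    ious = []
    for c in range(num_classes):
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(intersection / union)
    if not ious:
        raise ValueError("no class is present in pred or target")
    return sum(ious) / len(ious)


# Toy example with four pixels and two classes.
# Class 0: intersection 1, union 2; class 1: intersection 2, union 3.
print(mean_iou(pred=[0, 0, 1, 1], target=[0, 1, 1, 1], num_classes=2))
```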


StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles

Yifeng Ma , Suzhen Wang , Zhipeng Hu , Changjie Fan , Tangjie Lv , Yu Ding , Zhidong Deng , Xin Yu

Categories: Computer Vision

2023-01-03

Different people speak with diverse personalized speaking styles. Although existing one-shot talking head methods have made significant progress in lip sync, natural facial expressions, and stable head motions, they still cannot generate diverse speaking styles in the final talking head videos. To tackle this problem, we propose a one-shot style-controllable talking face generation framework. In a nutshell, we aim to attain a speaking style from an arbitrary reference speaking video and then drive the one-shot portrait to speak with the reference speaking style and another piece of audio. Specifically, we first develop a style encoder to extract dynamic facial motion patterns of a style reference video and then encode them into a style code. Afterward, we introduce a style-controllable decoder to synthesize stylized facial animations from the speech content and style code. In order to integrate the reference speaking style into generated videos, we design a style-aware adaptive transformer, which enables the encoded style code to adjust the weights of the feed-forward layers accordingly. Thanks to the style-aware adaptation mechanism, the reference speaking style can be better embedded into synthesized videos during decoding. Extensive experiments demonstrate that our method is capable of generating talking head videos with diverse speaking styles from only one portrait image and an audio clip while achieving authentic visual effects. Project Page: https://github.com/FuxiVirtualHuman/styletalk.


Policy Pre-training for End-to-end Autonomous Driving via Self-supervised Geometric Modeling

Penghao Wu , Li Chen , Hongyang Li , Xiaosong Jia , Junchi Yan , Yu Qiao

Categories: Computer Vision

2023-01-03

Witnessing the impressive achievements of pre-training techniques on large-scale data in the fields of computer vision and natural language processing, we wonder whether this idea could be adapted in a grab-and-go spirit to mitigate the sample inefficiency problem in visuomotor driving. Given the highly dynamic and variant nature of the input, the visuomotor driving task inherently lacks view and translation invariance, and the visual input contains a large amount of information irrelevant to decision making, which makes the predominant pre-training approaches from general vision less suitable for the autonomous driving task. To this end, we propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework curated for policy pre-training in visuomotor driving. We aim to learn policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos. PPGeo is performed in two stages to support effective self-supervised training. In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input. In the second stage, the visual encoder learns a driving policy representation by predicting the future ego-motion and optimizing the photometric error based on the current visual observation only. As such, the pre-trained visual encoder is equipped with rich driving-policy-related representations and is thereby competent for multiple visuomotor driving tasks. Extensive experiments covering a wide span of challenging scenarios have demonstrated the superiority of our proposed approach, with improvements ranging from 2% to even over 100% with very limited data. Code and models will be available at https://github.com/OpenDriveLab/PPGeo.
